Abstract

Newtonian dynamics is derived from prior information codified into
an appropriate statistical model. The basic assumption is that there is
an irreducible uncertainty in the location of particles so that the state
of a particle is defined by a probability distribution. The corresponding
configuration space is a statistical manifold, the geometry of which is
defined by the information metric. The trajectory follows from a principle
of inference, the method of Maximum Entropy. No additional "physical"
postulates are needed: no equation of motion, no action principle, no
concepts of momentum or phase space, and not even the notion of time.
The resulting entropic dynamics reproduces the Newtonian dynamics of any
number of particles interacting among themselves and with external fields.
Both the mass of the particles and their interactions are explained as a
consequence of the underlying statistical manifold.

1 Introduction 

It is widely assumed that geometry is useful because it describes properties 
of the real world. Indeed, Euclidean geometry may very well have been the 
first successful physics theory, the first example of a "law of nature." Later
developments such as Riemannian geometry and the theory of fiber bundles 
have only strengthened this conception: geometry works because it lies at the 
very core of physics. Thus, it may be surprising, at least at first sight, to 
find that the same methods of geometry have also turned out to be useful in 
statistical inference, a separate field that makes no claims to authority on natural 
phenomena. It could just be a coincidence but perhaps it is not. 

Perhaps the laws of physics are deeply geometrical because they are practical
rules to process information about the world and geometry is the uniquely
natural tool to do just that. This notion, that the laws of physics are not laws of
nature but rules of inference, seems outrageous but deserves serious attention.
The evidence supporting it is already considerable. Indeed, most of the formal
structure of statistical mechanics [1] and of quantum theory [5] can already be
derived from principles of inference (consistency, probabilities, entropy, etc.).

The objective of this paper is to use well established principles of inference 
to derive Newtonian dynamics from relevant prior information codified into a 
statistical model. The challenge, of course, is to accomplish this task without 
assuming what we want to derive. One must not assume equations of motion or 
principles of least action, and in particular, one must not assume the concept 
of momentum and the associated phase space, and not even the notion of an 
absolute Newtonian time. 

The first step is to construct a suitable statistical model of the space of 
states of a system of particles. A most remarkable fact is that the statistical 
configuration space is automatically endowed with a geometry and that this
"information" geometry turns out to be unique [3][4].

Next we tackle the dynamics: Given the initial and the final states, what 
trajectory is the system expected to follow? In the usual approach one postulates
an equation of motion or an action principle that presumably reflects
a "law of nature." For us the dynamics follows from a principle of inference, 
the method of Maximum Entropy, and we show that with a suitable choice 
of the statistical manifold the resulting "entropic dynamics" [S][B] reproduces 
Newtonian dynamics. 

The entropic dynamics approach allows us to see familiar notions such as 
time, mass and interactions from an unfamiliarly fresh perspective. For example, 
there is no reference to an external time but there is an internal "intrinsic" time 
that is a measure of the change of the system itself. Thus, the Newtonian 
universe turns out to be its own clock, and the familiar Newtonian time is 
not particularly fundamental but merely a convenient definition designed to 
make motion look as simple as possible. Both the mass of the particles and 
their interactions are explained in terms of an irreducible uncertainty of their 
positions; they are features of the underlying statistical manifold. 

2 Configuration space as a statistical manifold 

Let us start with a single particle moving in space: the configuration space is a
three dimensional manifold with some unknown metric tensor $g_{ij}(x)$. Our main
assumption is that there is a certain fuzziness to space; there is an irreducible
uncertainty in the location of the particle. Thus, when we say the particle is
at the point $x$ what we mean is that its "true" position $y$ is somewhere in the
vicinity of $x$. This leads us to associate a probability distribution $p(y|x)$ to
each point $x$ and the configuration space is thus transformed into a statistical
manifold: a point $x$ is no longer a structureless dot but a probability distribution.

Remarkably there is a unique measure of the extent to which the distribution
at $x$ can be distinguished from the neighboring distribution at $x + dx$. It is the
information metric of Fisher and Rao [3]. Thus, physical space, when viewed as
a statistical manifold, inherits a metric structure from the distributions $p(y|x)$.
We will assume that the originally unspecified metric $g_{ij}(x)$ is precisely the
information metric induced by the distributions $p(y|x)$.
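In [Hj we proposed that a Gaussian model,^1

$$ p(y|x) = \frac{\gamma^{1/2}(x)}{(2\pi)^{3/2}} \exp\left[ -\frac{1}{2}\gamma_{ij}(x)(y^i - x^i)(y^j - x^j) \right] , \tag{1} $$

where $\gamma = \det \gamma_{ij}$, incorporates the physically relevant information which
consists of an estimate of the particle position,

$$ \langle y^i \rangle = \int dy\, p(y|x)\, y^i = x^i , \tag{2} $$

and of its uncertainty given by the covariance matrix,

$$ \langle (y^i - x^i)(y^j - x^j) \rangle = \int dy\, p(y|x)(y^i - x^i)(y^j - x^j) = \gamma^{ij}(x) , \tag{3} $$

where $\gamma^{ij}$ is the inverse of $\gamma_{ij}$, $\gamma^{ik}\gamma_{kj} = \delta^i_j$.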

Unfortunately the expected values in eqs.(2) and (3) are not covariant under
coordinate transformations. Indeed, the transformation $y'^i = f^i(y)$ does not
lead to $x'^i = f^i(x)$ because in general $\langle f(y) \rangle \neq f(\langle y \rangle)$ except when uncertainties
are small. Our Gaussian model can at best be an approximation valid when 
p{y\x) is sharply localized in a very small region within which curvature effects 
are negligible. Fortunately this is all we need for our present purpose. 
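
A minimal numerical sketch of this point, assuming a one-dimensional Gaussian and the illustrative nonlinear map $f(y) = y^2$ (the map and the function names are our own choices for the example, not part of the model): the mismatch $\langle f(y) \rangle - f(\langle y \rangle)$ equals the variance $\sigma^2$, so it becomes negligible when the distribution is sharply localized.

```python
import math

def gaussian_expectation(f, x, sigma, n=4001, width=10.0):
    """Trapezoid-rule estimate of <f(y)> for a 1D Gaussian centered at x."""
    lo = x - width * sigma
    h = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        y = lo + k * h
        weight = 0.5 * h if k in (0, n - 1) else h
        density = math.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * density * f(y)
    return total

def covariance_mismatch(x, sigma):
    """Return <f(y)> - f(<y>) for the illustrative map f(y) = y**2."""
    f = lambda y: y * y
    return gaussian_expectation(f, x, sigma) - f(x)

# The mismatch shrinks like sigma**2 as the uncertainty decreases.
for sigma in (1.0, 0.1, 0.01):
    print(sigma, covariance_mismatch(2.0, sigma))
```

For any other smooth map the leading mismatch is again of second order in the uncertainty, which is why the Gaussian model is acceptable in the low uncertainty regime.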

[As an interesting aside we note that it is possible to devise fully covariant
models. Here is an example: Let $\gamma_{ij}(x)$ be a positive definite tensor field and
let us use it as if it were a metric tensor, $d\ell^2 = \gamma_{ij}\, dx^i dx^j$. Let $\ell(x,y)$ be the
$\gamma$-length along the $\gamma$-geodesic from the point $x$ to the point $y$. The proposed
distribution is

$$ p(y|x) = \frac{1}{\zeta}\, \gamma^{1/2}(y) \exp\left[ -\frac{\ell^2(x,y)}{2\sigma^2(x)} \right] , \tag{4} $$

which is a manifestly covariant object: the normalization constant $\zeta$, the
$\gamma$-length $\ell(x,y)$, the scalar field $\sigma(x)$, and $dy\, \gamma^{1/2}(y)$ are all invariants. From
this model we can compute a second metric, the information metric $g_{ij}$, which
need not in general coincide with $\gamma_{ij}$. In the limit of small uncertainties (after
absorbing $\sigma$ into $\gamma_{ij}$) one recovers eq.(1).]

3 The Information Metric 

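The information distance between $p(y|\theta)$ and $p(y|\theta + d\theta)$, where the $\theta^a$ are
parameters, is calculated from (see e.g., [3])

$$ d\ell^2 = G_{ab}\, d\theta^a d\theta^b \quad \text{with} \quad G_{ab} = \int dy\, p(y|\theta)\, \frac{\partial \log p(y|\theta)}{\partial \theta^a} \frac{\partial \log p(y|\theta)}{\partial \theta^b} . \tag{5} $$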



^1 We adopt the standard summation convention: repeated indices are summed over.

Consider the 9-dimensional space of Gaussians

$$ p(y|x,\gamma) = \frac{\gamma^{1/2}}{(2\pi)^{3/2}} \exp\left[ -\frac{1}{2}\gamma_{ij}(y^i - x^i)(y^j - x^j) \right] . \tag{6} $$



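Here the parameters $\theta^a$ include the three $x^i$ plus six independent elements of the
symmetric matrix $\gamma_{ij}$. Eq.(5) gives the information distance between $p(y|x,\gamma)$
and $p(y|x + dx, \gamma + d\gamma)$ as

$$ d\ell^2 = G_{ij}\, dx^i dx^j + G_k^{ij}\, d\gamma_{ij}\, dx^k + G^{ij\,kl}\, d\gamma_{ij}\, d\gamma_{kl} , \tag{7} $$

where

$$ G_{ij} = \gamma_{ij} , \quad G_k^{ij} = 0 , \quad \text{and} \quad G^{ij\,kl} = \frac{1}{4}\left( \gamma^{ik}\gamma^{jl} + \gamma^{il}\gamma^{jk} \right) . \tag{8} $$

($\gamma^{ij}$ is the inverse of $\gamma_{ij}$.) Therefore,

$$ d\ell^2 = \gamma_{ij}\, dx^i dx^j + \frac{1}{2}\gamma^{ik}\gamma^{jl}\, d\gamma_{ij}\, d\gamma_{kl} . \tag{9} $$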

This is the metric of the full 9-dimensional manifold, but it is not what we need. 
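
As a hedged check of the metric components, the sketch below evaluates the defining integral of the information metric numerically for the one-dimensional analogue of the Gaussian family, with parameters $(x, \gamma)$ where $\gamma$ is the inverse variance. The reduction to one dimension and the function names are assumptions made for brevity. In one dimension the expected components are $G_{xx} = \gamma$, $G_{x\gamma} = 0$ and $G_{\gamma\gamma} = 1/(2\gamma^2)$.

```python
import math

def log_density_gradient(y, x, gamma):
    """Derivatives of log p(y|x, gamma) with respect to x and gamma (1D Gaussian)."""
    d_x = gamma * (y - x)
    d_gamma = 0.5 / gamma - 0.5 * (y - x) ** 2
    return (d_x, d_gamma)

def fisher_metric_1d(x, gamma, n=8001, width=12.0):
    """Trapezoid-rule estimate of G_ab = int dy p (d_a log p)(d_b log p)."""
    sigma = 1.0 / math.sqrt(gamma)
    lo = x - width * sigma
    h = 2.0 * width * sigma / (n - 1)
    metric = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        y = lo + k * h
        weight = 0.5 * h if k in (0, n - 1) else h
        density = math.sqrt(gamma / (2.0 * math.pi)) * math.exp(-0.5 * gamma * (y - x) ** 2)
        grad = log_density_gradient(y, x, gamma)
        for a in range(2):
            for b in range(2):
                metric[a][b] += weight * density * grad[a] * grad[b]
    return metric

# Compare with [[gamma, 0], [0, 1 / (2 gamma**2)]] for gamma = 4.
G = fisher_metric_1d(x=0.3, gamma=4.0)
print(G)
```

The vanishing off-diagonal component is the one-dimensional counterpart of the absence of cross terms between $dx$ and $d\gamma$ in the line element.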

What we want is the metric of the embedded 3-dimensional submanifold
where $\gamma_{ij} = \gamma_{ij}(x)$ is some function of $x$. To find the induced metric we cannot
just substitute $d\gamma_{ij} = \partial_k \gamma_{ij}\, dx^k$ into eq.(9) because under a change of
coordinates $dx^k$ transforms as a tensor but the ordinary derivative $\partial_k \gamma_{ij}$ does not. In a
model of physical space the $i$ indices in $x^i$ cannot be treated independently from
the $ij$ indices that appear in $\gamma_{ij}$ because any transformation that changes the
$x^i$ also changes the $\gamma_{ij}$. Accordingly, we require that $d\gamma_{ij} = \nabla_k \gamma_{ij}\, dx^k$ where $\nabla_k$
is the covariant derivative and the corresponding induced information metric is

$$ g_{ij} = \gamma_{ij} + \frac{1}{2}\gamma^{ac}\gamma^{bd}\, \nabla_i \gamma_{ab}\, \nabla_j \gamma_{cd} . \tag{10} $$

Normally one is given a manifold of probability distributions and the problem 
is to find the corresponding information metric. In order to do physics we are 
also concerned with the inverse problem: we want to design statistical manifolds 
with the appropriate geometries. We want to find the covariance field tensor
$\gamma_{ij}(x)$ that leads to a given metric tensor $g_{ij}(x)$. Thus, we regard eq.(10) as
a set of differential equations for $\gamma_{ij}(x)$. Since $\nabla_k g_{ij} = 0$,^2 a straightforward
substitution shows that the solution is

$$ \gamma_{ij}(x) = g_{ij}(x) . \tag{11} $$

In words: information distance is measured in units of the local uncertainty.
This beautifully simple but non-trivial result is valid in the low uncertainty
regime where eq.(1) holds. The uniqueness of the solution and whether it
also holds in high curvature regions, such as near singularities, remain to be
ascertained.



^2 The choice of the Levi-Civita connection is justified in the next section.

4 Entropic Dynamics for a single particle



The key to the question "Given an initial and a final state, what trajectory 
is the system expected to follow?" lies in the implicit assumption that there 
exists a continuous trajectory. A large change is the result of a succession of 
very many small changes and therefore we only need to determine what a short 
segment of the trajectory looks like. The idea behind entropic dynamics is that 
as the system moves from a point $x$ to a neighboring point $x + \Delta x$ it must pass
through a halfway point [5].

The basic dynamical question can now be rephrased as follows: The system
is initially described by the probability distribution $p(y|x)$ and we are given the
information that it has moved to one of the neighboring states in the family
$p(y|x')$ where the $x'$ lie on the plane halfway between the initial $x$ and the
final $x + \Delta x$. Which $p(y|x')$ do we select? The answer is given by the method
of maximum (relative) entropy, ME. The selected distribution is that which
maximizes the entropy of $p(y|x')$ relative to the prior $p(y|x)$ subject to the
constraint that $x'$ is equidistant from $x$ and $x + \Delta x$. The result is that the
selected $x'$ minimizes the distance to $x$ and therefore the three points $x$, $x'$ and
$x + \Delta x$ lie on a straight line.
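
The selection step can be illustrated with a small sketch under simplifying assumptions that are ours: spherically symmetric Gaussians of equal, constant variance, for which the relative entropy is $-|x' - x|^2 / (2\sigma^2)$, and a displacement along the first coordinate axis so that the halfway plane is easy to sample. The function names and the grid of candidate points are hypothetical.

```python
def relative_entropy(x_prime, x, sigma):
    """Entropy of p(y|x') relative to p(y|x) for isotropic Gaussians of equal variance."""
    return -sum((a - b) ** 2 for a, b in zip(x_prime, x)) / (2.0 * sigma ** 2)

def halfway_plane_candidates(x, x_final, offsets):
    """Sample points equidistant from x and x_final.

    Assumes x and x_final differ only in their first coordinate, so the
    halfway plane is the set of points whose first coordinate is the midpoint.
    """
    mid = [(a + b) / 2.0 for a, b in zip(x, x_final)]
    return [(mid[0], mid[1] + u, mid[2] + v) for u in offsets for v in offsets]

x = (0.0, 0.0, 0.0)
x_final = (0.02, 0.0, 0.0)
offsets = [-0.01, -0.005, 0.0, 0.005, 0.01]

candidates = halfway_plane_candidates(x, x_final, offsets)
best = max(candidates, key=lambda c: relative_entropy(c, x, sigma=1.0))
print(best)  # the candidate on the straight segment joining x and x_final
```

Among the sampled points on the halfway plane, the entropy is largest at the one closest to $x$, which is the midpoint of the straight segment.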

Since any three neighboring points along the trajectory must line up, the 
trajectory predicted by entropic dynamics is the geodesic that minimizes the 
length 

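$$ J = \int d\lambda \left[ g_{ij}\, \dot{x}^i \dot{x}^j \right]^{1/2} \quad \text{with} \quad \dot{x}^i = \frac{dx^i}{d\lambda} , \tag{12} $$

where $\lambda$ is any parameter that labels points along the curve, $x^i = x^i(\lambda)$.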

Incidentally, note that in entropic dynamics there is one family of curves
that is singled out as special: these are the minimal-length geodesics. For
the purpose of building useful physics models no additional structure is needed
and thus none will be introduced. It is therefore natural to use this same
family of curves to define the notion of parallelism: the minimal-length geodesics
are defined to be the straightest curves. This definition leads to the Levi-Civita
connection which is equivalent to the condition $\nabla_k g_{ij} = 0$ assumed in
the previous section. (See e.g. ^) 

The simplest statistical model is a three-dimensional manifold of spherically
symmetric Gaussians with constant variance $\sigma_0^2$. The corresponding information
metric is

$$ g_{ij} = \gamma_{ij} = \frac{1}{\sigma_0^2}\, \delta_{ij} , \tag{13} $$

which we recognize as the familiar metric of flat Euclidean space. It is reassuring 
that already in such a simple model entropic dynamics reproduces the familiar 
straight line trajectories that are commonly associated with Galilean inertial 
motion. But this is too simple; non-trivial dynamics requires some curvature. 

We are thus led to consider a slightly more complicated model of spherically
symmetric Gaussians where the variance is a non-uniform scalar field $\sigma^2(x)$. It
is convenient to write the corresponding information metric as the Euclidean
metric eq.(13) modulated by a (positive) conformal factor $\Phi(x)$,

$$ g_{ij}(x) = \gamma_{ij}(x) = \frac{\Phi(x)}{\sigma_0^2}\, \delta_{ij} , \tag{14} $$

with $\sigma^2(x) = \sigma_0^2 / \Phi(x)$.^3

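It is convenient to rewrite the length eq.(12) with the metric eq.(14) in the form

$$ J = 2^{1/2} \int d\lambda\, L(x,\dot{x}) , \tag{15} $$

with a "Lagrangian" function

$$ L(x,\dot{x}) = \left[ \Phi(x)\, T_\lambda(\dot{x}) \right]^{1/2} \quad \text{with} \quad T_\lambda(\dot{x}) = \frac{1}{2\sigma_0^2}\, \delta_{ij}\, \dot{x}^i \dot{x}^j . \tag{16} $$

The geodesics follow from the Lagrange equations,

$$ \frac{d}{d\lambda} \frac{\partial L}{\partial \dot{x}^i} = \frac{\partial L}{\partial x^i} , \tag{17} $$

or

$$ \frac{1}{\sigma_0^2} \left( \frac{\Phi}{T_\lambda} \right)^{1/2} \frac{d}{d\lambda} \left[ \left( \frac{\Phi}{T_\lambda} \right)^{1/2} \frac{dx^i}{d\lambda} \right] = \frac{\partial \Phi}{\partial x^i} . \tag{18} $$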



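These rather formidable equations can be simplified considerably once we notice
that the parameter $\lambda$ is quite arbitrary. Let us replace the original $\lambda$ with a new
parameter $t$ given by

$$ dt = \left( \frac{T_\lambda}{\Phi} \right)^{1/2} d\lambda \quad \text{or} \quad \frac{d}{dt} = \left( \frac{\Phi}{T_\lambda} \right)^{1/2} \frac{d}{d\lambda} . \tag{19} $$

In terms of the new $t$ the equation of motion simplifies to

$$ \frac{1}{\sigma_0^2} \frac{d^2 x^i}{dt^2} = \frac{\partial \Phi}{\partial x^i} . \tag{20} $$

From eq.(19) the new $t$ is such that

$$ \Phi = T_\lambda \left( \frac{d\lambda}{dt} \right)^2 = T_t \quad \text{where} \quad T_t = \frac{1}{2\sigma_0^2}\, \delta_{ij}\, \frac{dx^i}{dt} \frac{dx^j}{dt} . \tag{21} $$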



Eqs.(20) and (21) are equivalent to Newtonian dynamics. To make it explicit
we introduce a "mass" $m$ and a "potential" $\phi(x)$ through a mere change of
notation,

$$ \frac{1}{\sigma_0^2} = m \quad \text{and} \quad \Phi(x) = -\phi(x) + E , \tag{22} $$

where the constant $E$ reflects the freedom to add a constant to the potential.
The result is Newton's equation,

$$ m \frac{d^2 x^i}{dt^2} = -\frac{\partial \phi}{\partial x^i} , \tag{23} $$

and energy conservation,

$$ \frac{1}{2} m\, \delta_{ij}\, \frac{dx^i}{dt} \frac{dx^j}{dt} + \phi(x) = E . \tag{24} $$

Thus, the constant $E$ is interpreted as energy.

^3 The effect of $\Phi(x)$ is a local dilation. Since each side of a small triangle at $x$ is dilated by
the same factor $\Phi(x)$ its angles remain unchanged. Such angle-preserving transformations are
called conformal.
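
As a hedged illustration of how the intrinsic parameter $t$ reproduces Newtonian time, the sketch below integrates $dt = (T_\lambda / \Phi)^{1/2} d\lambda$ along a one-dimensional path and compares the result with the elapsed time of uniformly accelerated motion. The restriction to one dimension, the uniform force field, the numerical values and the function names are all assumptions made for the example.

```python
import math

def intrinsic_time(conformal_factor, sigma0, x_start, x_end, steps=20000):
    """Midpoint-rule integral of dt = (T_lambda / Phi)^(1/2) d(lambda) along a 1D path.

    The path is labeled by lambda = x itself, so dx/dlambda = 1 and
    T_lambda = 1 / (2 sigma0**2).
    """
    t_lambda = 1.0 / (2.0 * sigma0 ** 2)
    h = (x_end - x_start) / steps
    elapsed = 0.0
    for k in range(steps):
        x = x_start + (k + 0.5) * h
        elapsed += math.sqrt(t_lambda / conformal_factor(x)) * h
    return elapsed

# Illustrative values: intrinsic uncertainty, acceleration, initial speed, distance.
sigma0, g, v0, distance = 0.5, 9.8, 1.0, 2.0
mass = 1.0 / sigma0 ** 2              # m = 1 / sigma0**2
energy = 0.5 * mass * v0 ** 2         # total energy, with phi(0) = 0
phi = lambda x: -mass * g * x         # potential of a uniform force along +x
Phi = lambda x: energy - phi(x)       # conformal factor Phi = -phi + E

t_entropic = intrinsic_time(Phi, sigma0, 0.0, distance)
# Newtonian elapsed time from distance = v0 t + g t**2 / 2.
t_newton = (math.sqrt(v0 ** 2 + 2.0 * g * distance) - v0) / g
print(t_entropic, t_newton)
```

The two numbers agree to within the accuracy of the quadrature, which is the content of the reparametrization: the time is read off from the change of the system itself.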

We have just derived F = ma purely from principles of inference applied to 
the relevant information codified into a statistical model! From eq.(12) onwards
our inference approach is formally identical to the Jacobi action principle of 
classical mechanics [8] but we did not need to know this. Indeed, by a wild 
stretch of our historical imagination it is perhaps conceivable that had Newton, 
Lagrange, and Jacobi known less physics and much more inference they might 
have invented their subject along these lines. Had history actually followed 
this unlikely course we might not have used the notions of mass $m$ or potential
$\phi(x)$ and instead we would have referred to the particle's "intrinsic" position
uncertainty $\sigma_0$, and how it is modulated throughout space by the field $\Phi(x)$.

The derivation above serves to illustrate the main idea but suffers from two 
important limitations. First, it applies to a single particle with a fixed constant 
energy and this means that we deal with an isolated system. Second, while 
it is true that we have identified a convenient and very suggestive parameter $t$,
how do we know that it actually represents "true" time? Is $t$ the universal
Newtonian time or just a parameter that applies only to one particular isolated
particle? The original formulation in terms of the "Jacobi" action, eq.(15), is
completely timeless; how and where did time sneak in?

The solution to both these problems emerges as we apply the formalism 
to the motion of the only system known to be completely isolated: the whole 
universe. Then the fact that the energy is a fixed constant does not represent a 
restriction. And further, since the preferred time parameter would be associated 
to the whole universe, it would not be at all inappropriate to call it the universal 
time. 

5 The whole universe: many particles 

To simplify our notation we will consider a universe that consists of $N = 2$
particles. The generalization to arbitrary $N$ is trivial. For the 2-particle system
the position $x = (x_1, x_2)$ is denoted by 6 coordinates $x^A$ with $A = 1, 2, \ldots 6$.
Let $x^A = (x^{i_1}, x^{i_2})$ with $i_1 = 1, 2, 3$ for particle 1 and $i_2 = 4, 5, 6$ for particle 2.
A point in the $N = 2$ configuration space is a Gaussian distribution,

$$ p(y|x) = \frac{\gamma^{1/2}}{(2\pi)^{6/2}} \exp\left[ -\frac{1}{2}\gamma_{AB}(x)(y^A - x^A)(y^B - x^B) \right] . \tag{25} $$
The simplest model for two (possibly non-identical) particles assigns uniform
variances $\sigma_1^2$ and $\sigma_2^2$ to each particle. The corresponding metric, analogous to
eq.(13), is

$$ g_{AB} = \gamma_{AB} = m_{AB} , \tag{26} $$

where $m_{AB}$ is a constant $6 \times 6$ diagonal matrix,

$$ m_{AB} = \begin{bmatrix} \delta_{i_1 j_1}/\sigma_1^2 & 0 \\ 0 & \delta_{i_2 j_2}/\sigma_2^2 \end{bmatrix} , \tag{27} $$

where each entry represents a $3 \times 3$ matrix. The metric $m_{AB}$ describes a flat
space; the trajectories are familiar "straight" lines and the particles move
independently of each other; they do not interact. As before, non-trivial dynamics
requires the introduction of curvature and the simplest way to do this is through
an overall conformal field $\Phi(x)$ with $x = (x_1, x_2)$. Thus we propose

$$ g_{AB}(x) = \gamma_{AB}(x) = \Phi(x)\, m_{AB} . \tag{28} $$
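The equation of motion for the $N = 2$ universe is the geodesic that minimizes

$$ J = 2^{1/2} \int d\lambda\, L(x_1, x_2, \dot{x}_1, \dot{x}_2) , \tag{29} $$

where

$$ L(x,\dot{x}) = \left[ \Phi(x)\, T_\lambda(\dot{x}) \right]^{1/2} \quad \text{and} \quad T_\lambda(\dot{x}) = \frac{1}{2}\, m_{AB}\, \dot{x}^A \dot{x}^B . \tag{30} $$

The Lagrange equations yield,

$$ m_{AB} \left( \frac{\Phi}{T_\lambda} \right)^{1/2} \frac{d}{d\lambda} \left[ \left( \frac{\Phi}{T_\lambda} \right)^{1/2} \frac{dx^B}{d\lambda} \right] = \frac{\partial \Phi}{\partial x^A} , \tag{31} $$

which suggests introducing a new parameter $t$ defined by

$$ dt = \left( \frac{T_\lambda}{\Phi} \right)^{1/2} d\lambda \quad \text{or} \quad \frac{d}{dt} = \left( \frac{\Phi}{T_\lambda} \right)^{1/2} \frac{d}{d\lambda} . \tag{32} $$

In terms of the new parameter the equations of motion are

$$ m_{AB} \frac{d^2 x^B}{dt^2} = \frac{\partial \Phi}{\partial x^A} , \tag{33} $$

which, since $m_{AB}$ is a diagonal matrix, is

$$ \frac{1}{\sigma_n^2} \frac{d^2 x^{i_n}}{dt^2} = \frac{\partial}{\partial x^{i_n}} \Phi(x_1, x_2) , \tag{34} $$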



for each of the particles, n = 1, 2. Note that the motion of particle 1 depends 
on the location of particle 2: these are interacting particles! 
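The new time parameter $t$, eq.(32), is such that

$$ \Phi = T_\lambda \left( \frac{d\lambda}{dt} \right)^2 = T_t \quad \text{where} \quad T_t = \frac{1}{2}\, m_{AB}\, \frac{dx^A}{dt} \frac{dx^B}{dt} . \tag{35} $$

As before, the equivalence to Newtonian dynamics is made explicit by a change
of notation,

$$ \frac{1}{\sigma_n^2} = m_n \quad \text{and} \quad \Phi(x) = -\phi(x) + E . \tag{36} $$

The result is

$$ m_n \frac{d^2 x^{i_n}}{dt^2} = -\frac{\partial \phi}{\partial x^{i_n}} \quad \text{and} \quad \frac{1}{2}\, m_{AB}\, \frac{dx^A}{dt} \frac{dx^B}{dt} + \phi(x) = E . \tag{37} $$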

The constant $E$ is the total energy of the universe and there are no restrictions
on the energy of individual subsystems. 

For the conformal factor $\Phi(x_1, x_2)$ we can choose anything we want. For
example,

$$ \Phi(x_1, x_2) = -V_1(x_1) - V_2(x_2) - u(x_1, x_2) + E , \tag{38} $$

so the particles can interact with external potentials $V_1$ and $V_2$ and also with
each other through $u(x_1, x_2)$.

The definition of time t required taking into account all the particles in the 
universe. This is in accord with the ephemeris time defined by astronomers. We 
started with a completely timeless theory, eq.(29), and in fact, no external time
has been introduced. What we have is a convenient $t$ parameter associated to
the change of the total system, which in this case is the whole universe. The
universe is its own clock; it measures universal time. Incidentally, note that the
reparametrization that allowed us to introduce a Newtonian time was possible
only because the same conformal factor $\Phi(x)$ applies equally to all particles.

Entropic dynamics offers a new perspective on the concepts of mass and 
interactions. To see this note that since $m_{AB}$ is diagonal the distribution (25)
turns out to be a product,

$$ p(y|x) = p(y_1|x_1, x_2)\, p(y_2|x_1, x_2) . \tag{39} $$

Note that although the model represents interacting particles the distribution is
a product: the uncertain variables $y_1$ and $y_2$ are statistically independent. The
coupling arises through conditioning on $x = (x_1, x_2)$.

Let us focus our attention on particle 1; similar remarks also apply to particle 
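2. The distribution $p(y_1|x_1, x_2)$ is a spherically symmetric Gaussian,

$$ p(y_1|x_1, x_2) \propto \exp\left[ -\frac{1}{2\sigma_1^2(x_1, x_2)}\, \delta_{i_1 j_1}(y^{i_1} - x^{i_1})(y^{j_1} - x^{j_1}) \right] . \tag{40} $$

The uncertainty in the position of particle 1 is given by

$$ \sigma_1(x_1, x_2) = \left[ \Phi(x_1, x_2)\, m_1 \right]^{-1/2} . \tag{41} $$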
The mass $m_1$ is interpreted in terms of a uniform background contribution to the
uncertainty. Mass is a manifestation of an uncertainty in location; higher mass
reflects a lower uncertainty. On the other hand, interactions arise from the
non-uniformity of $\sigma_1(x_1, x_2)$ that depends on the location of other particles through
the modulating field $\Phi(x_1, x_2)$. It is worthwhile to note that even though this
is a non-relativistic model there already appears a "unification" between mass 
and (potential) energy: they are different aspects of the same thing, the position 
uncertainty. 
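
A short sketch of this reading of mass and interaction, under assumptions that are ours alone: no external potentials, a repulsive interaction $u = 1/|x_1 - x_2|$ in arbitrary units, and illustrative values for the masses and the energy. It evaluates the position uncertainty $\sigma_n = [\Phi\, m_n]^{-1/2}$ for both particles at several separations; the function names are hypothetical.

```python
import math

def conformal_factor(x1, x2, energy, v1, v2, u):
    """Phi(x1, x2) = -V1(x1) - V2(x2) - u(x1, x2) + E."""
    return -v1(x1) - v2(x2) - u(x1, x2) + energy

def position_uncertainty(mass, phi_value):
    """sigma_n = (Phi * m_n)^(-1/2); requires a positive conformal factor."""
    if phi_value <= 0.0:
        raise ValueError("conformal factor must be positive")
    return 1.0 / math.sqrt(phi_value * mass)

# Illustrative choices: no external potentials and an inverse-distance repulsion.
free = lambda x: 0.0
repulsion = lambda x1, x2: 1.0 / math.dist(x1, x2)

m1, m2, energy = 4.0, 1.0, 10.0
for separation in (10.0, 1.0, 0.2):
    x1, x2 = (0.0, 0.0, 0.0), (separation, 0.0, 0.0)
    phi_value = conformal_factor(x1, x2, energy, free, free, repulsion)
    print(separation, position_uncertainty(m1, phi_value), position_uncertainty(m2, phi_value))
```

In this example the heavier particle always has the smaller uncertainty, and both uncertainties grow as the particles approach each other because the repulsion lowers the common conformal factor.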

6 Final remarks 

We emphasize that the model we have proposed does not take into account all 
the dynamical information that we know is relevant - relativistic and quantum 
effects have not been included. Our model is very restricted. For example, 
our model invokes two apparently unrelated metrics. There is the metric $\delta_{ij}$
of flat 3-dimensional Euclidean space that appears in the kinetic energies and
there is the information metric $g_{ij}$ that accounts for mass and interactions and
applies to the curved configuration space. This is a reflection of the fact that a
system of $N$ particles is described as a point in a $3N$-dimensional configuration
space. A better model would have N points living within the same evolving 
3-dimensional space. 

Furthermore, we have not provided any rationale for how to choose the modulating
field $\Phi(x)$. Just as Newton deliberately refrained from explaining the
origin of his inverse square forces - hypotheses non fingo - so have we refrained
from offering any physical hypothesis about the underlying fuzziness of space.
It is reasonable to expect that a derivation of general relativity as an example of
entropic dynamics would yield important insights on this matter. Preliminary
steps in this direction appeared in [6].

What we have done is to show, by exhibiting an explicit example, that 
the tools of inference - probability, information geometry and entropy - are 
sufficiently rich that one can construct entropic dynamics models that reproduce 
recognizable laws of physics. Perhaps all laws of physics can be derived in this 
way. 